Learning to predict synchronization of coupled oscillators on randomly generated graphs

Suppose we are given a system of coupled oscillators on an unknown graph along with the trajectory of the system during some period. Can we predict whether the system will eventually synchronize? Even with a known underlying graph structure, this is an important yet analytically intractable question in general. In this work, we take an alternative approach to the synchronization prediction problem by viewing it as a classification problem based on the fact that any given system will eventually synchronize or converge to a non-synchronizing limit cycle. By only using some basic statistics of the underlying graphs such as edge density and diameter, our method can achieve perfect accuracy when there is a significant difference in the topology of the underlying graphs between the synchronizing and the non-synchronizing examples. However, in the problem setting where these graph statistics cannot distinguish the two classes very well (e.g., when the graphs are generated from the same random graph model), we find that pairing a few iterations of the initial dynamics along with the graph statistics as the input to our classification algorithms can lead to significant improvement in accuracy; far exceeding what is known by the classical oscillator theory. More surprisingly, we find that in almost all such settings, dropping out the basic graph statistics and training our algorithms with only initial dynamics achieves nearly the same accuracy. We demonstrate our method on three models of continuous and discrete coupled oscillators—the Kuramoto model, Firefly Cellular Automata, and Greenberg-Hastings model. Finally, we also propose an “ensemble prediction” algorithm that successfully scales our method to large graphs by training on dynamics observed from multiple random subgraphs.

with step size h = 0.05. Accordingly, an 'iteration' for KM is a single application of the difference equation (2). After a change of time scale, we write X k for the configuration obtained after k iterations. We also consider two discrete models for coupled oscillators. Let Ω = N/κZ be the κ-state color wheel for some integer κ ≥ 3. The κ-color Firefly Cellular Automata (FCA) is a discrete-state discrete-time model for inhibitory Pulse Coupled Oscillators (PCOs) introduced by Lyu 5 . The intuition is that we view each node as an identical oscillator (firefly) of κ-states, which blinks whenever it returns to a designated blinking color b(κ) = κ−1 2 ; nodes with post-blinking color c ∈ (b(κ), κ − 1] in contact with at least one blinking neighbor do not advance, as if their phase update is being inhibited by the blinking neighbors, and otherwise advance to the next color (see Figure 1). More precisely, the coupling for FCA is defined as the following equation For visualization purposes, it is convenient to consider the equivalent dynamics of 'centered colorings'X t := X t − t (mod κ), so that if X t synchronizes, thenX t converges to a constant function. In fact, FCA dynamics are displayed in this way in  On the other hand, the Greenberg-Hastings model (GHM) is a discrete model for neural networks 6 introduced in 1987 by Greenberg and Hastings, where nodes of color 0 and 1 are called 'rested' and 'excited', respectively, and of any other color called 'refractory'. As in biological neural networks, rested neurons gets excited by neighboring excited nodes, and once excited, it has to spend some time in rested states to come back to the rested state again. More precisely, the coupling for GHM is defined as if X t (v) = 0 and X t (u) = 1 for some u ∈ N(v) X t (v) + 1 otherwise.
For GHM, note that X s is synchronized if and only if X t ≡ 0 for all t ≥ s + κ.
In all experiments in this paper, we consider κ = 5 instances of FCA and GHM. From here and hereafter, by FCA and GHM we mean the 5-color FCA and 5-color GHM, respectively. While all three models tend to synchronize locally their global behavior on the same graph and initial configuration evolve quite differently, as seen in Figure 2. There, we simulate each system on the same graph, a 20x20 lattice with an additional 80 edges added uniformly at random. A single initial phase concentration X 0 is chosen by assigning each node with a uniformly randomly chosen state from {0, 1, 2, 3, 4}. This initial configuration is evolved through three different models: FCA, GHM and KM.
This suggests that it is not feasible to predict synchronization for all three dynamics in the same way. Furthermore, the inclusion of random edges to the square lattice may disrupt some well-known behavior (e.g., spiral waves for coupled oscillators in 2D lattice 7 ) and result in unpredictable dynamics, analytically so. Hence, predicting synchronization on fully heterogeneous sets of graphs is a difficult task since keeping track of local interactions of oscillators is challenging to do when the overall structures of graphs within the heterogeneous set correspondingly have highly irregular and diverse couplings.

A simple lemma about Greenberg-Hastings model on a ring
Lemma. Consider the κ-color Greenberg-Hastings model (4) on a path of n nodes. Then for arbitrary initial coloring, κ ≥ 3, and n ≥ 1, the system will always converge to an all-zero configuration in n + κ iterations.
Proof. We use a back-tracking argument that was used to analyze FCA in 5,8,9 . Order the nodes in the path as 1, . . . , n from left to right so that nodes that differ by 1 are adjacent. Let X t : {1 . . . , n} → {0, 1, . . . , κ} denote the color configuration at time t. Fix T > κ and suppose that X T is not identically zero. Then necessarily X T −κ should have at least one 1.
Suppose that node i has color 1 at time T − κ. Then by the dynamics, one of its two neighbors (i − 1 or i + 1) should have color 1 at time T − κ − 1. Suppose that node i + 1 has color 1 at time T − κ − 1. By a similar argument, one of its two neighbors, now i or i + 2, should have color 1 at time T − κ − 2. But since X T −κ−1 (i) = 1, we should have X T −κ−2 (i) = 0, so we must have X T −κ−1 (i + 2) = 1. Repeating this argument, we see that backtracking in time, the color 1 at node 0 at time T − κ should travel at constant unit velocity, that is, X T −κ−s (i + 1 + s) = 1 for all s such that i + 1 + s ≤ n. It follows that X T −κ−(n−i−1) (n − 1) = 0 and X T −κ−(n−i−1) (n) = 1. But this yields a contradiction since then at time T − κ − (n − i), node n has color 0 and its only neighbor n − 1 cannot have color 1 to excite node n. By a similar argument, if node i − 1 has color 1 at time T − κ − 1, then the dynamics can only be back-tracked through time T − κ − i + 1. This shows that X T −κ cannot have color 1 if T − κ > n. We conclude that that X T should be identically zero of T > κ + n. .

Machine learning algorithms for binary classification
Different machine learning algorithms were employed to solve the problem of predicting whether or not a given graph and initial coloring for a system of coupled oscillators will tend to a synchronizing or non-synchronizing trajectory.
Random Forest (RF) 10 A random forest (RF) is a form of ensemble learning that produces decision trees, and in which these trees themselves are generated using random feature vectors that are sampled independently from a distribution.
Our implementation imposes a limit on the maximum amount of features used per tree to be the square root of the total amount of features in our dataset, √ p (where p is the number of features), and uses 100 estimator trees with iterates terminating at complete node purity.
Gradient Boosting (GB) 11 Gradient boosting (GB) is an ensemble learning algorithm for classification tasks similar to RFs where a loss function is minimized by a collection of decision trees to form a strong learner from a group of weak learners.
The main difference is that the trees are not trained simultaneously, as in RFs, but added iteratively, so that the current loss is reduced cumulatively so in each iteration for each tree.
Our implementation uses a learning rate of 0.4, 100 estimator trees, and the square root of total number of features per tree when searching for splits.
Feed-forward Neural Networks (FFNN) 12 Feedforward Neural Networks (FFNN) form a class of function approximators consisting of multiple layers of connected linear and non-linear activation functions 13,14 . The linear maps are parameterized by 'weights', which are subject to training on a given dataset. The algorithm works by iterating two phases. The first phase is the 'feed-forward' phase; the input is mapped to an output by applying the layers of linear and non-linear activation functions with the current weights. The loss between the output and the target labels are then computed. The second phase is the 'back-propagation' phase; the weights are modified in the direction that reduces the computed loss from the forward propagation phase.
Our implementation of an FFNN uses a learning rate of 0.01, cross-entropy loss, 4 fully connected linear layers, a hidden layer size of 100, ReLU activation, batch normalization, batch size 256, and dropout of 0.25 probability across 35 epochs. See 15 for backgrounds and jargons.
GraphLRCN A Long-term Recurrent Convolutional Network (LRCN) 16 is a neural network architecture that has been developed for video data classification by combining convolutional neural networks 17 and a special type of recurrent neural network known as the Long Short-Term Memory network (LSTM-Net) 18 . We propose a variant of LRCN that is suitable for learning to predict dynamics on graphs, which we call the GraphLRCN. The idea is to encode each configuration X t on a graph G as a weighted adjacency matrix ∆(X t ) of G. Then the dynamics (X t ) 0≤t≤r can be turned into a sequence of square matrices, which can be viewed as video data subject to classification.
Here we give a precise definition of the encoding X t → ∆(X t ). Given a κ-coloring X on a graph G = (V, E), we define the associated color displacement matrix ∆ = ∆(X) ∈ (Z κ ) |V |×|V | as If X : V → [0, 2π) is a configuration for KM, we define the associated color displacement matrix ∆(X) similarly as above by replacing mod κ with mod 2π. One can think of ∆(X) as the adjacency matrix of G weighted by the color differences assigned by X on the edges. An important feature of this encoding is that ∆(X)(i, j) is nonzero if and only if nodes i, j are adjacent in G and have different colors X(i) = X( j). For instance, if X is synchronized then ∆(X) is the zero matrix. Additionally, since the convolutional block components learn physical location-based associations in the matrix, we applied the reverse Cuthill-Mckee algorithm 19 on the sequences of adjacency matrices to reduce our matrix bandwidths and augment these physical associations in our representations. In combination, the techniques mentioned are intended to reduce the amount of unnecessary information and noise in the encoding for the purpose of learning to predict synchronization. Our implementation uses a learning rate of 0.01, a batch size of 2 11 , and batch normalization across 25 epochs.

3/7
Additional figures Figure S3. Supplementary experiment to Figure 5 in the main text to show sensitivity of our synchronization prediction methods to the size of training dataset. All settings are identical to the one in Figure 3 in the main text, except here we use smaller subsampled training set of sizes 10K and 40K for each classes of synchronizing and non-synchronizing examples, averaged over five runs. The first two columns use datasets with 15 node graphs, while the last two columns use datasets with 30 node graphs. Note that the full dataset used in Figure 3 in the main text uses 80K examples. Even trained with these smaller training set, we observe that most of our prediction algorithms significantly outperforms the baseline, although the performance does decrease in comparison to the main experiment in Figure 5 in the main text. (Note: This was averaged across 5 runs to address the randomness associated with subsampling).